{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Week 4: Clustering"
   ]
  },
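  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Goal: predict whether a user is interested in an event, based on events they’ve responded to in the past, user demographic information, and what events they’ve seen and clicked on in our app.\n",
    "\n",
    "Event descriptions are stored in events.csv, which has 110 columns. The first 9 columns are event_id, user_id, start_time, city, state, zip, country, lat, and lng:\n",
    "\n",
    "- event_id: the id of the event.\n",
    "- user_id: the id of the user who created the event.\n",
    "- city, state, zip, and country: the event location (if known).\n",
    "- lat and lng: floats giving the latitude and longitude of the event location.\n",
    "- start_time: a string in ISO-8601 UTC format giving the start time of the event.\n",
    "\n",
    "The last 101 columns are word counts: count_1, count_2, ..., count_100, count_other (they appear as c_1, ..., c_100, c_other in the loaded data):\n",
    "\n",
    "- count_N: the number of times the N-th most common word appears in the event description.\n",
    "- count_other: the number of occurrences of all words outside the 100 most common.\n",
    "\n",
    "Assignment:\n",
    "\n",
    "Cluster the events by their keyword counts (the count_1, count_2, ..., count_100, count_other columns); KMeans may be used. Try K = 10, 20, 30, ..., 100 and compute the CH score (Calinski-Harabasz index) for each K.\n",
    "\n",
    "Hint: because there are many samples, MiniBatchKMeans is recommended.\n",
    "\n",
    "Notes on the files:\n",
    "\n",
    "1. You can first run 0. EDA.ipynb to get an overview of all the competition data.\n",
    "2. The full set of events is very large (3 million+ records), so it is enough to cluster only the events that appear in train.csv and test.csv (13418 records). Running 1. Users_Events.ipynb yields the events that appear in train.csv and test.csv; you can modify that code to save them as CSV and then cluster them.\n",
    "\n",
    "Grading criteria:\n",
    "\n",
    "1. Extracting the events that appear in the training and test sets: 20 points\n",
    "2. Clustering: 40 points\n",
    "3. Computing the CH scores: 20 points\n",
    "4. Presenting and analyzing the results: 20 points\n"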
   ]
  },
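  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The K sweep described in the assignment could look roughly like the sketch below. It is only an illustration under stated assumptions: `ch_scores_for_ks`, `X_demo`, and `demo_scores` are hypothetical names, the data is synthetic rather than taken from events.csv, and `calinski_harabasz_score` is the spelling used by recent scikit-learn releases (older releases spell it `calinski_harabaz_score`).\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from sklearn.cluster import MiniBatchKMeans\n",
    "from sklearn.metrics import calinski_harabasz_score\n",
    "\n",
    "\n",
    "def ch_scores_for_ks(X, ks, random_state=0):\n",
    "    '''Fit MiniBatchKMeans for each K and return {K: Calinski-Harabasz score}.'''\n",
    "    scores = {}\n",
    "    for k in ks:\n",
    "        model = MiniBatchKMeans(n_clusters=k, n_init=3, random_state=random_state)\n",
    "        labels = model.fit_predict(X)\n",
    "        scores[k] = calinski_harabasz_score(X, labels)\n",
    "    return scores\n",
    "\n",
    "\n",
    "# Synthetic stand-in for the word-count matrix (hypothetical data, not events.csv).\n",
    "rng = np.random.RandomState(0)\n",
    "X_demo = rng.poisson(1.0, size=(500, 101)).astype(float)\n",
    "demo_scores = ch_scores_for_ks(X_demo, ks=range(10, 31, 10))\n",
    "```\n",
    "\n",
    "On the real data, `X` would be the 101 word-count columns and `ks` would be `range(10, 101, 10)`."
   ]
  },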
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>event_id</th>\n",
       "      <th>user_id</th>\n",
       "      <th>start_time</th>\n",
       "      <th>city</th>\n",
       "      <th>state</th>\n",
       "      <th>zip</th>\n",
       "      <th>country</th>\n",
       "      <th>lat</th>\n",
       "      <th>lng</th>\n",
       "      <th>c_1</th>\n",
       "      <th>...</th>\n",
       "      <th>c_92</th>\n",
       "      <th>c_93</th>\n",
       "      <th>c_94</th>\n",
       "      <th>c_95</th>\n",
       "      <th>c_96</th>\n",
       "      <th>c_97</th>\n",
       "      <th>c_98</th>\n",
       "      <th>c_99</th>\n",
       "      <th>c_100</th>\n",
       "      <th>c_other</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>684921758</td>\n",
       "      <td>3647864012</td>\n",
       "      <td>2012-10-31T00:00:00.001Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>244999119</td>\n",
       "      <td>3476440521</td>\n",
       "      <td>2012-11-03T00:00:00.001Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3928440935</td>\n",
       "      <td>517514445</td>\n",
       "      <td>2012-11-05T00:00:00.001Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2582345152</td>\n",
       "      <td>781585781</td>\n",
       "      <td>2012-10-30T00:00:00.001Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1051165850</td>\n",
       "      <td>1016098580</td>\n",
       "      <td>2012-09-27T00:00:00.001Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 110 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     event_id     user_id                start_time city state  zip country  \\\n",
       "0   684921758  3647864012  2012-10-31T00:00:00.001Z  NaN   NaN  NaN     NaN   \n",
       "1   244999119  3476440521  2012-11-03T00:00:00.001Z  NaN   NaN  NaN     NaN   \n",
       "2  3928440935   517514445  2012-11-05T00:00:00.001Z  NaN   NaN  NaN     NaN   \n",
       "3  2582345152   781585781  2012-10-30T00:00:00.001Z  NaN   NaN  NaN     NaN   \n",
       "4  1051165850  1016098580  2012-09-27T00:00:00.001Z  NaN   NaN  NaN     NaN   \n",
       "\n",
       "   lat  lng  c_1   ...     c_92  c_93  c_94  c_95  c_96  c_97  c_98  c_99  \\\n",
       "0  NaN  NaN    2   ...        0     1     0     0     0     0     0     0   \n",
       "1  NaN  NaN    2   ...        0     0     0     0     0     0     0     0   \n",
       "2  NaN  NaN    0   ...        0     0     0     0     0     0     0     0   \n",
       "3  NaN  NaN    1   ...        0     0     0     0     0     0     0     0   \n",
       "4  NaN  NaN    1   ...        0     0     0     0     0     0     0     0   \n",
       "\n",
       "   c_100  c_other  \n",
       "0      0        9  \n",
       "1      0        7  \n",
       "2      0       12  \n",
       "3      0        8  \n",
       "4      0        9  \n",
       "\n",
       "[5 rows x 110 columns]"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "# Load the data\n",
    "events = pd.read_csv(\"events.csv\")  # Large file: just reading it can stall the machine.\n",
    "events.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1. Find the events that appear in the training and test sets, and re-index them"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pickle    # cPickle in Python 2.7\n",
    "\n",
    "import itertools\n",
    "\n",
    "# For handling event time strings\n",
    "import datetime\n",
    "\n",
    "import numpy as np\n",
    "import scipy.io as sio\n",
    "import scipy.sparse as ss\n",
    "\n",
    "# Similarity / distance\n",
    "import scipy.spatial.distance as ssd\n",
    "\n",
    "from collections import defaultdict\n",
    "from sklearn.preprocessing import normalize"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "number of uniqueEvents :13418\n"
     ]
    }
   ],
   "source": [
    "\"\"\"\n",
    "We only care about events that appear in train and test, so we process just that subset.\n",
    "train.csv has 6 columns:\n",
    "user: user ID\n",
    "event: event ID\n",
    "invited: whether the user was invited (0/1)\n",
    "timestamp: ISO-8601 UTC time string for when the user saw the event\n",
    "interested and not_interested\n",
    "test.csv has the same columns as train.csv except interested and not_interested\n",
    "\"\"\"\n",
    "\n",
    "# Collect the distinct users and events in the training and test sets\n",
    "uniqueUsers = set()\n",
    "uniqueEvents = set()\n",
    "\n",
    "# Inverted indexes:\n",
    "# the events each user attended / the users attending each event\n",
    "eventsForUser = defaultdict(set)\n",
    "usersForEvent = defaultdict(set)\n",
    "\n",
    "for filename in [\"train.csv\", \"test.csv\"]:\n",
    "    # Text mode ('r') rather than 'rb': in Python 3, splitting bytes with a str\n",
    "    # separator raises \"a bytes-like object is required, not 'str'\"\n",
    "    with open(filename, 'r') as f:\n",
    "        f.readline()    # skip the header row (column names)\n",
    "\n",
    "        for line in f:    # for each record\n",
    "            cols = line.strip().split(\",\")\n",
    "            uniqueUsers.add(cols[0])   # column 1: user ID\n",
    "            uniqueEvents.add(cols[1])   # column 2: event ID\n",
    "            #eventsForUser[cols[0]].add(cols[1])    # this user attended this event\n",
    "            #usersForEvent[cols[1]].add(cols[0])    # this event was attended by this user\n",
    "\n",
    "\n",
    "n_uniqueUsers = len(uniqueUsers)\n",
    "n_uniqueEvents = len(uniqueEvents)\n",
    "\n",
    "#print(\"number of uniqueUsers :%d\" % n_uniqueUsers)\n",
    "print(\"number of uniqueEvents :%d\" % n_uniqueEvents)\n",
    "\n",
    "# User-event relation matrix, usable later as input to LFM/SVD++\n",
    "# It is a sparse matrix recording each user's interest in each event\n",
    "'''userEventScores = ss.dok_matrix((n_uniqueUsers, n_uniqueEvents))\n",
    "userIndex = dict()'''\n",
    "eventIndex = dict()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Re-index users (disabled)\n",
    "#for i, u in enumerate(uniqueUsers):\n",
    "#    userIndex[u] = i\n",
    "\n",
    "# Re-index events: map each event ID to a consecutive integer\n",
    "for i, e in enumerate(uniqueEvents):\n",
    "    eventIndex[e] = i\n",
    "'''\n",
    "n_records = 0\n",
    "ftrain = open(\"train.csv\", 'r')   # text mode; with 'rb', Python 3 raises \"a bytes-like object is required, not 'str'\"\n",
    "ftrain.readline()\n",
    "for line in ftrain:\n",
    "    cols = line.strip().split(\",\")\n",
    "    i = userIndex[cols[0]]  # user\n",
    "    j = eventIndex[cols[1]] # event\n",
    "        \n",
    "    eventsForUser[i].add(j)    # this user attended this event\n",
    "    usersForEvent[j].add(i)    # this event was attended by this user\n",
    "        \n",
    "    #userEventScores[i, j] = int(cols[4]) - int(cols[5])   #interested - not_interested\n",
    "    score = int(cols[4])\n",
    "    #if score == 0:  # 0 means the entry is absent in a sparse matrix, so -1 would stand for interested=0\n",
    "    #userEventScores[i, j] = -1\n",
    "    #else:\n",
    "    userEventScores[i, j] = score\n",
    "ftrain.close()\n",
    "\n",
    "\n",
    "## Events attended by each user; used later so that events attended by friends can influence the user\n",
    "pickle.dump(eventsForUser, open(\"PE_eventsForUser.pkl\", 'wb'))\n",
    "## Users attending each event\n",
    "pickle.dump(usersForEvent, open(\"PE_usersForEvent.pkl\", 'wb'))\n",
    "\n",
    "# Save the user-event relation matrix R for later use\n",
    "sio.mmwrite(\"PE_userEventScores\", userEventScores)\n",
    "\n",
    "\n",
    "# Save the user index table\n",
    "pickle.dump(userIndex, open(\"PE_userIndex.pkl\", 'wb'))\n",
    "# Save the event index table\n",
    "pickle.dump(eventIndex, open(\"PE_eventIndex.pkl\", 'wb'))\n",
    "\n",
    "\n",
    "# To avoid unnecessary computation, find all related users and related events\n",
    "# Related users: user pairs with activity on at least one common event\n",
    "# Related events: event pairs with activity from at least one common user\n",
    "uniqueUserPairs = set()\n",
    "uniqueEventPairs = set()\n",
    "for event in uniqueEvents:\n",
    "    i = eventIndex[event]\n",
    "    users = usersForEvent[i]\n",
    "    if len(users) > 2:\n",
    "        uniqueUserPairs.update(itertools.combinations(users, 2))\n",
    "        \n",
    "for user in uniqueUsers:\n",
    "    u = userIndex[user]\n",
    "    events = eventsForUser[u]\n",
    "    if len(events) > 2:\n",
    "        uniqueEventPairs.update(itertools.combinations(events, 2))\n",
    " \n",
    "# Save the user-event pair index tables\n",
    "#pickle.dump(uniqueUserPairs, open(\"FE_uniqueUserPairs.pkl\", 'wb'))\n",
    "pickle.dump(uniqueEventPairs, open(\"PE_uniqueEventPairs.pkl\", 'wb'))'''"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Extract the rows of events.csv whose events appear in the training and test sets."
   ]
  },
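  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One possible alternative for this extraction step is to let pandas read the file in chunks and filter each chunk, which avoids loading all of events.csv at once. The sketch below is an assumption-laden illustration, not the method this notebook uses: `filter_events`, `demo_csv`, and `demo` are hypothetical names, and the demo rows are made up. Note that `pd.read_csv` parses `event_id` as integers, so IDs collected as strings would need converting with `int` first.\n",
    "\n",
    "```python\n",
    "import io\n",
    "import pandas as pd\n",
    "\n",
    "\n",
    "def filter_events(csv_source, wanted_ids, chunksize=100000):\n",
    "    '''Return the rows of csv_source whose event_id is in wanted_ids, reading in chunks.'''\n",
    "    kept = []\n",
    "    for chunk in pd.read_csv(csv_source, chunksize=chunksize):\n",
    "        kept.append(chunk[chunk['event_id'].isin(wanted_ids)])\n",
    "    return pd.concat(kept, ignore_index=True)\n",
    "\n",
    "\n",
    "# Tiny in-memory stand-in for events.csv (hypothetical rows).\n",
    "demo_csv = io.StringIO('event_id,user_id,c_1,c_other\\n1,10,0,3\\n2,11,1,5\\n3,12,2,7\\n')\n",
    "demo = filter_events(demo_csv, wanted_ids={1, 3}, chunksize=2)\n",
    "```\n",
    "\n",
    "Because each chunk is filtered before being kept, peak memory is governed by `chunksize` plus the matching rows rather than by the full file."
   ]
  },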
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Stream events.csv and keep only the rows whose event_id is in uniqueEvents\n",
    "rows = []\n",
    "with open('events.csv', 'rb') as f:\n",
    "    # The first line holds the column names\n",
    "    columns = f.readline().strip().decode('utf-8').split(\",\")\n",
    "    for line in f:\n",
    "        cols = line.strip().decode('utf-8').split(\",\")\n",
    "        if cols[0] in uniqueEvents:    # column 1 is event_id\n",
    "            rows.append(cols)\n",
    "\n",
    "# Build the DataFrame once instead of appending row by row\n",
    "train_data = pd.DataFrame(rows, columns=columns)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Save the extracted events as CSV so they can be reloaded with pd.read_csv\n",
    "train_data.to_csv(\"train_data.csv\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 2. Clustering\n",
    "Cluster the events by their keyword counts (the count_1, count_2, ..., count_100, count_other columns) using KMeans."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import the required packages\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "from sklearn.cluster import MiniBatchKMeans\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn import metrics\n",
    "\n",
    "from sklearn.decomposition import PCA\n",
    "import time\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load the extracted events\n",
    "train = pd.read_csv('train_data.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Unnamed: 0</th>\n",
       "      <th>event_id</th>\n",
       "      <th>user_id</th>\n",
       "      <th>start_time</th>\n",
       "      <th>city</th>\n",
       "      <th>state</th>\n",
       "      <th>zip</th>\n",
       "      <th>country</th>\n",
       "      <th>lat</th>\n",
       "      <th>lng</th>\n",
       "      <th>...</th>\n",
       "      <th>c_92</th>\n",
       "      <th>c_93</th>\n",
       "      <th>c_94</th>\n",
       "      <th>c_95</th>\n",
       "      <th>c_96</th>\n",
       "      <th>c_97</th>\n",
       "      <th>c_98</th>\n",
       "      <th>c_99</th>\n",
       "      <th>c_100</th>\n",
       "      <th>c_other</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>684921758</td>\n",
       "      <td>3647864012</td>\n",
       "      <td>2012-10-31T00:00:00.001Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>244999119</td>\n",
       "      <td>3476440521</td>\n",
       "      <td>2012-11-03T00:00:00.001Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>3928440935</td>\n",
       "      <td>517514445</td>\n",
       "      <td>2012-11-05T00:00:00.001Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>2582345152</td>\n",
       "      <td>781585781</td>\n",
       "      <td>2012-10-30T00:00:00.001Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>4</td>\n",
       "      <td>1051165850</td>\n",
       "      <td>1016098580</td>\n",
       "      <td>2012-09-27T00:00:00.001Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 111 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   Unnamed: 0    event_id     user_id                start_time city state  \\\n",
       "0           0   684921758  3647864012  2012-10-31T00:00:00.001Z  NaN   NaN   \n",
       "1           1   244999119  3476440521  2012-11-03T00:00:00.001Z  NaN   NaN   \n",
       "2           2  3928440935   517514445  2012-11-05T00:00:00.001Z  NaN   NaN   \n",
       "3           3  2582345152   781585781  2012-10-30T00:00:00.001Z  NaN   NaN   \n",
       "4           4  1051165850  1016098580  2012-09-27T00:00:00.001Z  NaN   NaN   \n",
       "\n",
       "   zip country  lat  lng   ...     c_92  c_93  c_94  c_95  c_96  c_97  c_98  \\\n",
       "0  NaN     NaN  NaN  NaN   ...        0     1     0     0     0     0     0   \n",
       "1  NaN     NaN  NaN  NaN   ...        0     0     0     0     0     0     0   \n",
       "2  NaN     NaN  NaN  NaN   ...        0     0     0     0     0     0     0   \n",
       "3  NaN     NaN  NaN  NaN   ...        0     0     0     0     0     0     0   \n",
       "4  NaN     NaN  NaN  NaN   ...        0     0     0     0     0     0     0   \n",
       "\n",
       "   c_99  c_100  c_other  \n",
       "0     0      0        9  \n",
       "1     0      0        7  \n",
       "2     0      0       12  \n",
       "3     0      0        8  \n",
       "4     0      0        9  \n",
       "\n",
       "[5 rows x 111 columns]"
      ]
     },
     "execution_count": 62,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Unnamed: 0</th>\n",
       "      <th>event_id</th>\n",
       "      <th>user_id</th>\n",
       "      <th>start_time</th>\n",
       "      <th>city</th>\n",
       "      <th>state</th>\n",
       "      <th>zip</th>\n",
       "      <th>country</th>\n",
       "      <th>lat</th>\n",
       "      <th>lng</th>\n",
       "      <th>...</th>\n",
       "      <th>c_92</th>\n",
       "      <th>c_93</th>\n",
       "      <th>c_94</th>\n",
       "      <th>c_95</th>\n",
       "      <th>c_96</th>\n",
       "      <th>c_97</th>\n",
       "      <th>c_98</th>\n",
       "      <th>c_99</th>\n",
       "      <th>c_100</th>\n",
       "      <th>c_other</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>13413</th>\n",
       "      <td>13413</td>\n",
       "      <td>1889561284</td>\n",
       "      <td>318112591</td>\n",
       "      <td>2012-10-30T07:00:00.003Z</td>\n",
       "      <td>West Covina</td>\n",
       "      <td>CA</td>\n",
       "      <td>NaN</td>\n",
       "      <td>United States</td>\n",
       "      <td>34.028</td>\n",
       "      <td>-117.910</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13414</th>\n",
       "      <td>13414</td>\n",
       "      <td>2738205241</td>\n",
       "      <td>187138671</td>\n",
       "      <td>2012-11-10T03:00:00.003Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13415</th>\n",
       "      <td>13415</td>\n",
       "      <td>3409015015</td>\n",
       "      <td>1019195677</td>\n",
       "      <td>2012-09-28T21:00:00.003Z</td>\n",
       "      <td>Kitchener</td>\n",
       "      <td>ON</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Canada</td>\n",
       "      <td>43.428</td>\n",
       "      <td>-80.434</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13416</th>\n",
       "      <td>13416</td>\n",
       "      <td>3119357029</td>\n",
       "      <td>3318624521</td>\n",
       "      <td>2012-10-31T00:00:00.001Z</td>\n",
       "      <td>London</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>United Kingdom</td>\n",
       "      <td>51.481</td>\n",
       "      <td>-0.191</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13417</th>\n",
       "      <td>13417</td>\n",
       "      <td>2736696425</td>\n",
       "      <td>3264288794</td>\n",
       "      <td>2012-10-26T03:00:00.003Z</td>\n",
       "      <td>Los Angeles</td>\n",
       "      <td>CA</td>\n",
       "      <td>NaN</td>\n",
       "      <td>United States</td>\n",
       "      <td>34.041</td>\n",
       "      <td>-118.259</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>132</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 111 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       Unnamed: 0    event_id     user_id                start_time  \\\n",
       "13413       13413  1889561284   318112591  2012-10-30T07:00:00.003Z   \n",
       "13414       13414  2738205241   187138671  2012-11-10T03:00:00.003Z   \n",
       "13415       13415  3409015015  1019195677  2012-09-28T21:00:00.003Z   \n",
       "13416       13416  3119357029  3318624521  2012-10-31T00:00:00.001Z   \n",
       "13417       13417  2736696425  3264288794  2012-10-26T03:00:00.003Z   \n",
       "\n",
       "              city state  zip         country     lat      lng   ...     c_92  \\\n",
       "13413  West Covina    CA  NaN   United States  34.028 -117.910   ...        0   \n",
       "13414          NaN   NaN  NaN             NaN     NaN      NaN   ...        0   \n",
       "13415    Kitchener    ON  NaN          Canada  43.428  -80.434   ...        0   \n",
       "13416       London   NaN  NaN  United Kingdom  51.481   -0.191   ...        0   \n",
       "13417  Los Angeles    CA  NaN   United States  34.041 -118.259   ...        0   \n",
       "\n",
       "       c_93  c_94  c_95  c_96  c_97  c_98  c_99  c_100  c_other  \n",
       "13413     0     0     0     0     0     0     0      0        4  \n",
       "13414     0     0     0     0     0     0     0      0        7  \n",
       "13415     0     0     0     0     0     0     0      0       10  \n",
       "13416     0     0     0     0     0     0     0      0        6  \n",
       "13417     0     0     0     0     0     0     0      0      132  \n",
       "\n",
       "[5 rows x 111 columns]"
      ]
     },
     "execution_count": 63,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train.tail()"
   ]
  },
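  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The c_other counts are noticeably larger than the other word counts, which is expected because c_other aggregates every word outside the top 100. This may strongly influence the clustering result; how best to handle it is still undecided."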
   ]
  },
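  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A few common ways to reduce the weight of a dominant count column are sketched below. These are only candidate options, not a choice made in this notebook: the `counts` array is hypothetical (its last column plays the role of c_other), and which option suits this data, if any, would need to be checked against the CH scores.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from sklearn.preprocessing import normalize\n",
    "\n",
    "# Hypothetical word-count rows: the last column plays the role of c_other.\n",
    "counts = np.array([[2.0, 0.0, 9.0],\n",
    "                   [0.0, 1.0, 132.0]])\n",
    "\n",
    "# Option A: dampen large counts with log(1 + x).\n",
    "log_counts = np.log1p(counts)\n",
    "\n",
    "# Option B: scale each row to unit L2 norm so long descriptions do not dominate.\n",
    "unit_rows = normalize(counts, norm='l2')\n",
    "\n",
    "# Option C: drop the c_other column and cluster on the top-100 words only.\n",
    "top_words_only = counts[:, :-1]\n",
    "```\n",
    "\n",
    "Options A and B keep c_other as a feature while shrinking its range; option C discards it entirely."
   ]
  },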
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 13418 entries, 0 to 13417\n",
      "Columns: 111 entries, Unnamed: 0 to c_other\n",
      "dtypes: float64(2), int64(104), object(5)\n",
      "memory usage: 11.4+ MB\n"
     ]
    }
   ],
   "source": [
    "train.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>event_id</th>\n",
       "      <th>c_1</th>\n",
       "      <th>c_2</th>\n",
       "      <th>c_3</th>\n",
       "      <th>c_4</th>\n",
       "      <th>c_5</th>\n",
       "      <th>c_6</th>\n",
       "      <th>c_7</th>\n",
       "      <th>c_8</th>\n",
       "      <th>c_9</th>\n",
       "      <th>...</th>\n",
       "      <th>c_92</th>\n",
       "      <th>c_93</th>\n",
       "      <th>c_94</th>\n",
       "      <th>c_95</th>\n",
       "      <th>c_96</th>\n",
       "      <th>c_97</th>\n",
       "      <th>c_98</th>\n",
       "      <th>c_99</th>\n",
       "      <th>c_100</th>\n",
       "      <th>c_other</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>13413</th>\n",
       "      <td>1889561284</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13414</th>\n",
       "      <td>2738205241</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13415</th>\n",
       "      <td>3409015015</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13416</th>\n",
       "      <td>3119357029</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13417</th>\n",
       "      <td>2736696425</td>\n",
       "      <td>6</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>14</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>132</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 102 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "         event_id  c_1  c_2  c_3  c_4  c_5  c_6  c_7  c_8  c_9   ...     c_92  \\\n",
       "13413  1889561284    0    0    0    0    0    0    0    0    0   ...        0   \n",
       "13414  2738205241    0    0    0    0    0    0    0    0    0   ...        0   \n",
       "13415  3409015015    0    1    0    0    0    0    0    0    1   ...        0   \n",
       "13416  3119357029    0    0    0    0    0    0    0    0    0   ...        0   \n",
       "13417  2736696425    6    1    2    0    3   14    0    0    1   ...        0   \n",
       "\n",
       "       c_93  c_94  c_95  c_96  c_97  c_98  c_99  c_100  c_other  \n",
       "13413     0     0     0     0     0     0     0      0        4  \n",
       "13414     0     0     0     0     0     0     0      0        7  \n",
       "13415     0     0     0     0     0     0     0      0       10  \n",
       "13416     0     0     0     0     0     0     0      0        6  \n",
       "13417     0     0     0     0     0     0     0      0      132  \n",
       "\n",
       "[5 rows x 102 columns]"
      ]
     },
     "execution_count": 70,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "n_trains = 13418\n",
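    "y_train = train.event_id  # event ids, kept for reference only; the clustering itself is unsupervised\n",
    "# Drop the bookkeeping and location columns. Note that event_id is NOT dropped here,\n",
    "# so X_train keeps 102 columns (event_id plus the 101 word-count columns).\n",
    "X_train = train.drop([\"Unnamed: 0\",\"user_id\",\"start_time\",\"city\",\"state\",\"zip\",\"country\",\"lat\",\"lng\"],axis=1)#.values[:n_trains]\n",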
    "X_train.tail()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "the shape of train_image: (13418, 102)\n"
     ]
    }
   ],
   "source": [
    "# Number of samples and feature dimensions of the input\n",
    "print('the shape of train_image: {}'.format(X_train.shape))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     684921758\n",
       "1     244999119\n",
       "2    3928440935\n",
       "3    2582345152\n",
       "4    1051165850\n",
       "Name: event_id, dtype: int64"
      ]
     },
     "execution_count": 73,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y_train.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
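    "The feature dimensions are only weakly correlated with each other, so no dimensionality reduction is applied.\n",
    "The cell below defines a function that fits the clustering for a given K and computes its score (named `CH_score` in the code)."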
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Fit and score the model for a single hyperparameter value (number of clusters K)\n",
    "def K_cluster_analysis(K, X_train, y_train):\n",
    "    start = time.time()\n",
    "    \n",
    "    print(\"K-means begin with clusters: {}\".format(K))\n",
    "    \n",
    "    # Fit K-means on the training set; MiniBatchKMeans is used because there are many samples\n",
    "    mb_kmeans = MiniBatchKMeans(n_clusters = K)\n",
    "    mb_kmeans.fit(X_train)\n",
    "    \n",
    "    # Predict on the training and validation sets (unused)\n",
    "    #y_train_pred = mb_kmeans.fit_predict(X_train)\n",
    "    #y_val_pred = mb_kmeans.predict(X_val)\n",
    "    \n",
    "    # Scatter-plot the cluster assignments using the first two features (unused)\n",
    "    #plt.scatter(X_train[:, 0], X_train[:, 1], c=y_pred)\n",
    "    #plt.show()\n",
    "\n",
    "    # Criteria for evaluating K:\n",
    "    # common choices are the Silhouette Coefficient and the Calinski-Harabasz Index;\n",
    "    # for both, a larger value indicates a better clustering.\n",
    "    # NOTE: the value computed below is the silhouette score, even though the variable is named CH_score.\n",
    "    #CH_score = metrics.calinski_harabaz_score(X_train,mb_kmeans.predict(X_train))\n",
    "    CH_score = metrics.silhouette_score(X_train,mb_kmeans.predict(X_train))\n",
    "    \n",
    "    # K could also be evaluated on a validation set (unused)\n",
    "    #v_score = metrics.v_measure_score(y_val, y_val_pred)\n",
    "    \n",
    "    end = time.time()\n",
    "    print(\"CH_score: {}, time elaps:{}\".format(CH_score, int(end-start)))\n",
    "   # print(\"v_score: {}\".format(v_score))\n",
    "    \n",
    "    return CH_score#,v_score"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "K-means begin with clusters: 10\n",
      "CH_score: 0.5339076597061969, time elaps:68\n",
      "K-means begin with clusters: 20\n",
      "CH_score: 0.5294858362298163, time elaps:27\n",
      "K-means begin with clusters: 30\n",
      "CH_score: 0.523004781124343, time elaps:22\n",
      "K-means begin with clusters: 40\n",
      "CH_score: 0.5234772721421301, time elaps:18\n",
      "K-means begin with clusters: 50\n",
      "CH_score: 0.5236316548195806, time elaps:20\n",
      "K-means begin with clusters: 60\n",
      "CH_score: 0.526534721695752, time elaps:28\n",
      "K-means begin with clusters: 70\n",
      "CH_score: 0.5258365756108925, time elaps:26\n",
      "K-means begin with clusters: 80\n",
      "CH_score: 0.5171691067383565, time elaps:21\n",
      "K-means begin with clusters: 90\n",
      "CH_score: 0.5195090538455147, time elaps:21\n",
      "K-means begin with clusters: 100\n",
      "CH_score: 0.5194612795265671, time elaps:24\n"
     ]
    }
   ],
   "source": [
    "# Search range for the hyperparameter (number of clusters K)\n",
    "Ks = [10,20,30,40,50,60,70,80,90,100]\n",
    "CH_scores = []\n",
    "for K in Ks:\n",
    "    ch = K_cluster_analysis(K, X_train, y_train)\n",
    "    CH_scores.append(ch)\n",
    "    #v_scores.append(v)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<matplotlib.lines.Line2D at 0x2cac3f59160>]"
      ]
     },
     "execution_count": 78,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYoAAAD8CAYAAABpcuN4AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJzt3Xt4VNW5x/HvyyUqN7UQEYGKrVTBlqJGilLRVq2gCLYkCvWIWClWpUjUU229VGk93it4Acul3tqjKCriFQulQC0qQQWNiFzUgqBEEJWDAiHv+WPtmCGGZAJJ9lx+n+eZh5m19+x5Zxznl732XmubuyMiIrIzjeIuQEREUpuCQkREqqWgEBGRaikoRESkWgoKERGploJCRESqpaAQEZFqKShERKRaCgoREalWk7gLqAtt2rTxTp06xV2GiEhaWbhw4cfunlvTehkRFJ06daKoqCjuMkRE0oqZvZ/Meup6EhGRaikoRESkWgoKERGploJCRESqpaAQEZFqKShERKRaCgoREalWdgfFqlUwahRs2xZ3JSIiKSu7g2LhQhg7Fm65Je5KRERSVnYHxemnQ0EBXHcdvP123NWIiKSk7A4KgDvvhObNYdgwKCuLuxoRkZSjoGjbFsaMgRdfhPHj465GRCTlKCgAzj4bTj4ZrrgC3k9qjiwRkayhoAAwgz//GdzhV78K/4qICKCgqHDggXDDDfD88/DXv8ZdjYhIylBQJLrwQjjmmDC2Yt26uKsREUkJCopEjRvDpEmwaROMHBl3NSIiKUFBUVmXLnD11TBlCjz5ZNzViIjETkFRld/8Brp1C11RGzfGXY2ISKwUFFXJyYHJk+HDD0NoiIhkMQXFzuTlwaWXwsSJMHt23NWIiMQmqaAwsz5mttTMlpvZFVUsH2pmJWb2enQbFrUfaGYLo7ZiM/tV1N7MzJ4xs7ej9htr2lYsrr0WDj4YfvlL2Lw5tjJEROJUY1CYWWPgbqAv0BUYbGZdq1h1irt3j26Tora1wDHu3h34AXCFmR0QLbvV3Q8FDgd6mVnfGrbV8Jo1C3sUK1bA738fWxkiInFKZo+iB7Dc3Ve6+1bgYWBAMht3963uviV6uEf567n7ZnefXb4O8CrQobbFN4jjj4fhw+FPf4IFC+KuRkSkwSUTFO2BVQmPV0dtlQ00s8VmNtXMOpY3mllHM1scbeMmd1+T+CQz2wc4DZhV07Zic/PNsP/+cN55sHVr3NWIiDSoZILCqmirPBnSU0And+8GzATu/2pF91VR+8HAOWbW9qsNmzUBHgLucPeVNW1rh6LMhptZkZkVlZSUJPE2dsPee8M998Abb8BNN9Xva4mIpJhkgmI1kPhXfQdgh70Cd1+f0MU0ETiy8kaiPYli4NiE5gnAMncfU5ttRetNcPc8d8/Lzc1N4m3sptNOg0GD4I9/hLfeqv/XExFJEckExQKgs5kdZGY5wCBgeuIKZtYu4WF/YEnU3sHM9oru7wv0ApZGj/8I7A2MSmZbKWHsWGjRIlzkaPv2uKsREWkQNQaFu5cCI4AZhB/tR9y92MxGm1n/aLWR0Wmui4CRwNCovQvwctQ+h3Cm0xtm1gG4knAW1auVToPd2bbit99+ISzmz4e77467GhGRBmGeAddeyMvL86KiooZ5MXc49VSYMweKi6FTp4Z5XRGROmZmC909r6b1NDK7tszCge1GjcJpsxkQtCIi1VFQ7IpvfhNuvBH+/nd44IG4qxERqVcKil11wQXQqxcUFobJA0VEMpSCYlc1ahRmmN28GX7967irERGpNwqK3XHIIWEOqKlT4Ykn4q5GRKReKCh212WXQffu4SJHn3wSdzUiInVOQbG7mjYNXVAlJfDf/x13NSIidU5BUReOOCLsWUyeDLNm1by+iEgaUVDUld//Hjp3Dhc5+r//i7saEZE6o6CoK3vtBZMmwbvvwtVXx12NiEidUVDU
pd69w/iKsWPh5ZfjrkZEpE4oKOrajTfCAQfoIkcikjEUFHWtVaswF1RxMdxwQ9zViIjsNgVFfTj1VPj5z+H66+HNN+OuRkRktygo6suYMeESqrrIkYikOQVFfcnNrTiofeedcVcjIrLLFBT1afDg0A115ZWwcmXc1YiI7BIFRX0yg/HjoXFjXeRIRNKWgqK+dewIN98cpva49964qxERqTUFRUMYPjwMxrvkEli7Nu5qRERqRUHREBo1gokT4csvYcSIuKsREamVpILCzPqY2VIzW25mV1SxfKiZlZjZ69FtWNR+oJktjNqKzexXCc850szeiLZ5h5lZ1P4NM/u7mS2L/t23rt5srL7zHbjuOnj8cXjssbirERFJWo1BYWaNgbuBvkBXYLCZda1i1Snu3j26TYra1gLHuHt34AfAFWZ2QLRsPDAc6Bzd+kTtVwCz3L0zMCt6nBkuvTRMSX7RRbBhQ9zViIgkJZk9ih7Acndf6e5bgYeBAcls3N23uvuW6OEe5a9nZu2AVu4+390deAA4PVpvAHB/dP/+hPb016RJuGbFxx+H0BARSQPJBEV7YFXC49VRW2UDzWyxmU01s47ljWbW0cwWR9u4yd3XRM9fvZNttnX3tQDRv/tVVZSZDTezIjMrKikpSeJtpIju3eHyy+G+++CFF+KuRkSkRskEhVXRVnlAwFNAJ3fvBsykYo8Ad18VtR8MnGNmbZPcZrXcfYK757l7Xm5ubm2eGr+rr4ZDDoHzz4dNm+KuRkSkWskExWqgY8LjDsCaxBXcfX1CF9NE4MjKG4n2JIqBY6NtdtjJNj+KuqbKu6jWJVFjetlzz3CRo/feg6uuirsaEZFqJRMUC4DOZnaQmeUAg4DpiSuU/7BH+gNLovYOZrZXdH9foBewNOpS+tzMekZnOw0BnoyePx04J7p/TkJ7ZvnhD8NB7TvugPnz465GRGSnagwKdy8FRgAzCAHwiLsXm9loM+sfrTYyOv11ETASGBq1dwFejtrnALe6+xvRsguAScByYAXwXNR+I3CSmS0DTooeZ6YbboAOHcJFjrZsqXl9EZEYmGfA/EN5eXleVFQUdxm75rnn4JRTwnGL0aPjrkZEsoiZLXT3vJrW08jsuPXtC//1X+ESqpreQ0RSkIIiFVx7LZSWwrhxcVciIvI1CopU8O1vQ//+YUryL76IuxoRkR0oKFJFYSGsXw8PPhh3JSIiO1BQpIreveHww8O1tjPgBAMRyRwKilRhFvYqliyBGTPirkZE5CsKilRy5pnQrh3cfnvclYiIfEVBkUpycsKFjV54AYqL465GRARQUKSe88+HvfYKxypERFKAgiLVtG4NQ4aEs5/Safp0EclYCopUNGpUmPtp/Pi4KxERUVCkpEMPDVN7jBunyQJFJHYKilRVWAgffQQPPRR3JSKS5RQUqerEE+G73w2nymoAnojESEGRqszCsYrFi2H27LirEZEspqBIZWedBbm5GoAnIrFSUKSyPfeECy+Ep5+Gd96JuxoRyVIKilR3wQVhxPbYsXFXIiJZSkGR6tq2DV1Q990HGzbEXY2IZCEFRTooLITNm2HChLgrEZEslFRQmFkfM1tqZsvN7Ioqlg81sxIzez26DYvau5vZfDMrNrPFZnZmwnPmJay/xsymRe3Hm9mnCcuuqas3m7a+9z044QS46y7Yti3uakQkyzSpaQUzawzcDZwErAYWmNl0d3+r0qpT3H1EpbbNwBB3X2ZmBwALzWyGu29092MTXuMx4MmE581z93678oYyVmEh9OsHjz4KP/953NWISBZJZo+iB7Dc3Ve6+1bgYWBAMht393fcfVl0fw2wDshNXMfMWgI/BqbVpvCs07cvHHKIBuCJSINLJijaA6sSHq+O2iobGHUvTTWzjpUXmlkPIAdYUWnRT4FZ7v5ZQtvRZrbIzJ4zs8OSqDHzNWoEF18MRUXw4otxVyMiWSSZoLAq2ir/SfsU0MnduwEzgft32IBZO+BB
4Fx3L6v03MFA4oRGrwIHuvv3gTvZyZ6GmQ03syIzKyrJlum4hwyBfffVADwRaVDJBMVqIHEPoQOwJnEFd1/v7uXTnE4EjixfZmatgGeAq9z9pcTnmVlrQtfWMwnb+szdN0X3nwWamlmbykW5+wR3z3P3vNzc3MqLM1Pz5vCrX8G0afDuu3FXIyJZIpmgWAB0NrODzCwHGARMT1wh2mMo1x9YErXnAE8AD7j7o1VsuwB42t2/TNjW/mZm0f0eUY3rk39LGe6ii0I31B13xF2JiGSJGoPC3UuBEcAMQgA84u7FZjbazPpHq42MToFdBIwEhkbtZwC9gaEJp7t2T9j8IHbsdgLIB96MtnUHMMhdR2+/0r49nHkmTJ4Mn34adzUikgUsE36D8/LyvKioKO4yGs7ChZCXB7fdBpdcEnc1IpKmzGyhu+fVtJ5GZqejI4+EY48N3U+lpXFXIyIZTkGRrgoL4f33w4FtEZF6pKBIV/37w7e+pVNlRaTeKSjSVePGMHIk/Pvf8MorcVcjIhlMQZHOfvELaNVKexUiUq8UFOmsZUsYNixMFLhqVc3rS/rZsgUeeQReekkzB0tsFBTpbuTIMEngXXfFXYnUtS1bYODAMG7m6KNhn33gpJPgD3+AuXPhyy9r3oZIHdA4ikxwxhnw97+HvYoWLeKuRurC1q2Qnw9PPQVjxoSBlnPnwpw58MYb4Y+DnBz4wQ/guOOgd2845pgwzYtIkpIdR6GgyATz54cfiTvvhBGVLwkiaWfbthD+06bBuHHhuumJNmwIMwjPmRPC49VXYft2aNIkjLEpD45evcJeiMhOKCiyTc+esH49LF0a5oKS9LRtGwweDI89lnzwf/55OPutfI/jlVfCdsyge/cQGuW3Nl+bX1OymIIi20yZAoMGwZNPhjEWkn5KS8PVCx99NJzJNmrUrm3niy/Cwe+5c8Nt/vzQBtC1a8UeR+/ecMABdVe/pB0FRbYpLQ0D8L79bZg9O+5qpLZKS+Hss+Hhh+HWW+HSS+tu21u3hgtelXdV/etfsGlTWHbwwSEwysOjU6e6e11JeQqKbHTLLfCb38Brr4UuB0kP27fDOefA3/4GN90U/hvWp9JSeP31iq6qefPgk0/Csm9+s2Jv47jjoHPn0IUlGUlBkY02boQOHcIplfffX/P6Er/t28PAyQcegOuvh9/9ruFrKCuD4uKKPY45c2DdurCsbduK0OjTJ+yxSsZQUGSrX/8a/vznMGFgu3Y1ry/xKSsLAybvvRdGj4arr467osAd3nmnIjTmzIHVq6FZs9Devn3cFUod0TTj2erii0PXwrhxcVci1Skrg/PPDyFxzTWpExIQupoOOQR++Uv461/hP/8JB8Q3b4bHH4+7OomBgiLTHHxwOOtp/PiKM10ktbiHS9pOmgRXXgnXXht3RdUzC6dff/e74YwsyToKikxUWBjGVDz4YNyVSGXuYWzEPffA5ZeH6TjS5WBxQUE4Y2rt2rgrkQamoMhEvXvD4YeHqR8y4BhUxnAPYyPGjYPLLoMbbkifkIAwpYi7up+ykIIiE5mFvYolS2DGjLirEQg/sJdeGi5fO2oU3HxzeoUEhMF6Xbuq+ykLKSgy1ZlnhrOedK2K+LmHsRG33x7OSvvTn9IvJMoVFISzoT76KO5KpAElFRRm1sfMlprZcjO7oorlQ82sxMxej27DovbuZjbfzIrNbLGZnZnwnPvM7N2E53SP2s3M7ohea7GZHVFXbzar5OSEA6YvvBDOkZd4uIexEbfeChdeCGPHpm9IgLqfslSNQWFmjYG7gb5AV2CwmXWtYtUp7t49uk2K2jYDQ9z9MKAPMMbMEqez/O+E57wetfUFOke34cD4XXpnEk6/3HPPcKxCGp57OO31xhvDf4s770zvkAA47DA49FCYOjXuSqQBJbNH0QNY7u4r3X0r8DAwIJmNu/s77r4sur8GWAfk1vC0AcADHrwE7GNmGjm2K9q0gSFDwtlPJSVxV5N9
rrsujLYeNiwcwM6EWX3Nwl7FP/9ZMXpbMl4y39z2QOJ1NldHbZUNjLqKpppZx8oLzawHkAOsSGi+PnrO7Wa2R21ez8yGm1mRmRWV6Edw50aNCldKu+eeuCvJLn/4QwiKoUPDSPlMCIlyBQVhwOC0aXFXIg0kmW9vVfvKlc+5fAro5O7dgJnADhMNRXsEDwLnuntZ1Pxb4FDgKOAbwOW1eD3cfYK757l7Xm5uTTspWaxLF+jbF+6+OwSG1L//+Z8w2vrss8OgukwKCYDvfS9MFqizn7JGMt/g1UDiHkIHYE3iCu6+3t3Lf4UmAkeWLzOzVsAzwFVRV1L5c9ZG3UtbgHsJXVxJvZ7UUmFhOEvloYfiriTz3XxzGG191llheo7GjeOuqO6Zhb2K2bPh44/jrkYaQDJBsQDobGYHmVkOMAiYnrhCpWMI/YElUXsO8AThmMOjVT3HzAw4HXgzWjQdGBKd/dQT+NTdNRR0d5x4Yph+4fbbNQCvPt12WxhtPWgQ3HdfZoZEuYKCMPOtup+yQo1B4e6lwAhgBiEAHnH3YjMbbWbll1IbGZ0CuwgYCQyN2s8AegNDK58GC/zNzN4A3gDaAH+M2p8FVgLLCXsnF+7um8x6ZuFYxeLFuqhRfRkzJoy2LigIJw80aRJ3RfXr+98PU46r+ykraJrxbPHll+GiND/4ATz1VNzVZJa77goD6X72s3CFuqZN466oYfz2t+FiWR99BK1bx12N7AJNMy472nNPuOACePrpcE0BqRvjx4eQGDAgHAPKlpCAcJrs9u3hOu2S0RQU2eTCC8OI7bFj464kM0yYED7T006DRx4Jn202OeIIOOggdT9lAQVFNmnbFn7+83CgdcOGuKtJb5Mnh9HWp5wSfiizLSSgYvDdzJkV19yWjKSgyDaFheFKZRMnxl1J+rrvvnD1t5NPhscegz32qPEpGaugIFxRUd1PGU1BkW26dYMTTgjzDm3bFnc16efBB+EXvwinHD/xRDj2k83y8uDAAzX3U4ZTUGSjwkL44AP1LdfW//5vmJLjRz8K4wf22ivuiuJX3v30wguwcWPc1Ug9UVBko7594ZBDNACvNqZMCVNy9O4N06dDs2ZxV5Q6CgrC3qlOu85YCops1KgRXHwxFBXBiy/GXU3qe/TRMCVHr17hx7B587grSi09ekDHjtpDzWAKimw1ZAjsu6+ugFeTxx+HwYOhZ0945hlo0SLuilJPeffTjBnw6adxVyP1QEGRrZo3D6d3TpsG774bdzWpwz2cFfbhh2GU9ZlnwlFHwbPPQsuWcVeXugoKYOvWMKBTMo6m8MhmH3wAnTrBiBHpv2dRWgqffw6ffbbjLdm2xPbt2yu226NHOFC7997xvbd0UFYWpojJy9NEgWkk2Sk8MnzmMqlW+/Zwxhlh8Nh110GrVvHW4w6rVsFbb4WrpyXzo15+++KL5F6jefPwPhNv++339baWLUPXXL9+2pNIRqNGofvpnnvCfxt9ZhlFQZHtCgvDaZ+TJ4f7DaGsDP7zHyguDqGQeNu06evrN2369R/ytm3DxXMq/7hX97hFi8ye+jtu+flhepinnw7HdSRjqOtJwimfq1bBsmV1Oz329u3w3nshABJDYcmScByg3P77w2GHQdeu4dalC3ToUPEDn80jn9NJWVn479azZzgJQFKeup4keYWFYYrsadPCX4W1VVoKK1fuuGdQXAxvvx2mNy/Xvn0Igl/+siIYunSBb3yj7t6LxKdRIxg4MFz+ddMmnSGWQbRHIeEv/+98J/xlX924im3bYMWKr3cZLV264/W4O3bccQ+h/KYDwplv7lw47riKM8YkpWmPQpLXuDGMHBmugvfKK9C9e+iGStw7eOutcB2LxPmhOnUKAfCTn1QEw6GHxn9QXOLTq1f4g2PqVAVFBtEehQSffx76lxs1Ct0GpaWh3Sxcc6DyHkKXLhqhLFW76CK4914oKdF3JMVpj0Jqp2VLuO22MPq4S5eKYDjkEM1r
JLVTUADjxsFzz+3aMS9JOdqjEJG6tX07HHAAHH98mExRUladXjPbzPqY2VIzW25mV1SxfKiZlZjZ69FtWNTe3czmm1mxmS02szMTnvO3aJtvmtlfzKxp1H68mX2asK1rkn/bIhK7xo3DWXRPP73jadCStmoMCjNrDNwN9AW6AoPNrGsVq05x9+7RbVLUthkY4u6HAX2AMWa2T7Tsb8ChwPeAvYBhCdual7Ct0bv0zkQkPvn5ISSefz7uSqQOJLNH0QNY7u4r3X0r8DAwIJmNu/s77r4sur8GWAfkRo+f9QjwCtBhV96AiKSg446DNm009XiGSCYo2gOrEh6vjtoqGxh1L001s46VF5pZDyAHWFGpvSlwNpD4p8fRZrbIzJ4zs8OSqFFEUkmTJhXdT8nOwyUpK5mgsCraKh8Bfwro5O7dgJnA/TtswKwd8CBwrruXVXruOGCuu8+LHr8KHOju3wfuBKqcitLMhptZkZkVlZSUJPE2RKRB5eeHU61nzIi7EtlNyQTFaiBxD6EDsCZxBXdf7+7lQ3MnAkeWLzOzVsAzwFXu/lLi88zs94SuqEsStvWZu2+K7j8LNDWzNpWLcvcJ7p7n7nm5ublJvA0RaVDHHw+tW6v7KQMkExQLgM5mdpCZ5QCDgOmJK0R7DOX6A0ui9hzgCeABd3+00nOGAScDgxP3MsxsfzOz6H6PqMb1tX1jIhKzpk3h9NPD5WMT5/yStFNjULh7KTACmEEIgEfcvdjMRptZ/2i1kdEpsIuAkcDQqP0MoDcwNOF01+7RsnuAtsD8SqfB5gNvRtu6AxjkmTDYQyQbFRSEUf8vvBB3JbIbNOBOROrPtm3h2iH9+sEDD8RdjVRSpwPuRER2SXn305NP7jjDsKQVBYWI1K+CgnC52pkz465EdpGCQkTq1wknwD776OynNKagEJH6lZMDAwaE7qetW+OuRnaBgkJE6l9+PmzcCLNmxV2J7AIFhYjUv5NOClc+VPdTWlJQiEj922OP0P00bdqOl9OVtKCgEJGGkZ8Pn3wC//hH3JVILSkoRKRh/OQn4ZK76n5KOwoKEWkYe+4Jp50GTzyh7qc0o6AQkYZTUAAbNsA//xl3JVILCgoRaTgnnwwtWsDUqXFXIrWgoBCRhrPXXmGCwMcfh9LSuKuRJCkoRKRhFRTAxx/D3LlxVyJJUlCISMPq0weaNdPZT2lEQSEiDatZs4rup+3b465GkqCgEJGGl58P69bBvHlxVyJJUFCISMM75ZRwYFvdT2lBQSEiDa958xAW6n5KCwoKEYlHQQF8+CG8+GLclUgNFBQiEo9TTw3Teqj7KeUlFRRm1sfMlprZcjO7oorlQ82sxMxej27DovbuZjbfzIrNbLGZnZnwnIPM7GUzW2ZmU8wsJ2rfI3q8PFreqW7eqoiklBYtoG9feOwxKCuLuxqpRo1BYWaNgbuBvkBXYLCZda1i1Snu3j26TYraNgND3P0woA8wxsz2iZbdBNzu7p2BT4DzovbzgE/c/WDg9mg9EclEBQWwdi38+99xVyLVSGaPogew3N1XuvtW4GFgQDIbd/d33H1ZdH8NsA7INTMDfgyUT/hyP3B6dH9A9Jho+QnR+iKSafr1Cxc10txPKS2ZoGgPrEp4vDpqq2xg1L001cw6Vl5oZj2AHGAF0BrY6O7lk70kbvOr14uWfxqtX3l7w82syMyKSkpKkngbIpJyWrYMI7WnTlX3UwpLJiiq+mveKz1+Cujk7t2AmVTsEYQNmLUDHgTOdfeyGraZzOvh7hPcPc/d83Jzc2t4CyKSsgoK4IMP4OWX465EdiKZoFgNJO4hdADWJK7g7uvdfUv0cCJwZPkyM2sFPANc5e4vRc0fA/uYWZMqtvnV60XL9wY2JPuGRCTN9OsHOTk6+ymFJRMUC4DO0VlKOcAgYHriCtEeQ7n+wJKoPQd4AnjA3b/6Fri7A7OB/KjpHODJ6P706DHR8n9E64tIJtp773CdCnU/pawagyI6
TjACmEEIgEfcvdjMRptZ/2i1kdEpsIuAkcDQqP0MoDcwNOHU2e7RssuBS8xsOeEYxOSofTLQOmq/BPja6bgikmHy82HVKliwIO5KpAqWCX+s5+XleVFRUdxliMiu2rgR9tsPRo6EW2+Nu5qazZ4dQq1Jk4pb06ZV36+rZY0b1/nbMLOF7p5X03pNalpBRKTe7bMPnHRS6H665RZI1TPiy8rgD3+Aa69t+Nc2qzpERo6EK6+s15dWUIhIaigogGefhaIiOOqouKv5uk2bYOjQMJL87LNh7Fho1Chc0nXbtvBv+a0+H1de1rWq8c91S0EhIqlhwIDwF/Kjj6ZeULz3XqjvzTfhttugsDB193rqgSYFFJHUsO++cOKJofsplY6dzpkTguv998MezyWXZFVIgIJCRFJJQQG8+y68+mrclQTjx4fwat0aXnklnMabhRQUIpI6BgwIZ/fEPffT1q1wwQVw4YXwk5+EUePf+U68NcVIQSEiqaN1azjhhHCcIq7up5KScAbWPffA5ZfD9OlhUGAWU1CISGrJz4cVK2DRooZ/7UWLwvGIV16Bv/4VbryxXsYvpBsFhYiklp/+NPw4N/TcT1OnwjHHhFNO582Ds85q2NdPYQoKEUktbdrAj37UcN1PZWVwzTXhQHq3bmHEdV6Ng5WzioJCRFJPfj4sWwZvvFG/r/P55zBwYBhtfe658M9/Qrt2NT4t2ygoRCT1/PSnYdRzfXY/rVwZupqmT4cxY2Dy5HC1PfkaBYWIpJ799oPjjqu/7qd//CMctP7gA3j+ebj44qwbRFcbCgoRSU0FBbB0KRQX19023eGuu8LYiLZtw9lNJ51Ud9vPUAoKEUlNP/tZ+Cu/rrqftm6F4cPh17+GU06Bl16Cgw+um21nOAWFiKSmtm2hd++6GaW9bl0YyDdpEvzudzBtGrRqtfvbzRIKChFJXQUF8NZb4barXnstnO66cCE89BBcf304UC5J06clIqmrvPtpV/cqpkyBXr3CsYl//QsGDarb+rKEgkJEUle7dvDDH9b+OEVZGVx1VQiGI44IF0M64oj6qTELKChEJLXl54cLBr39dnLrf/YZnH566GI67zyYNSsc75BdllRQmFkfM1tqZsvN7Ioqlg81sxLBDkuXAAAGgUlEQVQzez26DUtY9ryZbTSzpys9Z17C+mvMbFrUfryZfZqw7JrdfZMiksYGDgz/JtP9tHw5HH10uMDQnXfCxIkaRFcHarwUqpk1Bu4GTgJWAwvMbLq7Vz66NMXdR1SxiVuAZsD5iY3ufmzCazwGPJmweJ6790vuLYhIRmvfPhxnePTR0J20MzNnwhlnhGMaL7wAP/5xw9WY4ZLZo+gBLHf3le6+FXgYGJDsC7j7LODznS03s5bAj4FpyW5TRLJMfj4sXgzvvPP1Ze4wdiz06QMHHBAm9VNI1KlkgqI9sCrh8eqorbKBZrbYzKaaWcda1PBTYJa7f5bQdrSZLTKz58zssFpsS0Qy0c66n7ZsCcchRo2Cfv1g/nz41rcavr4Ml0xQVDUBSuXJV54COrl7N2AmcH8tahgMPJTw+FXgQHf/PnAnO9nTMLPhZlZkZkUlJSW1eDkRSTsdO0LPnjsGxYcfhunI770Xrr4aHn8cWraMr8YMlkxQrAYS9xA6AGsSV3D39e6+JXo4ETgymRc3s9aErq1nErb1mbtviu4/CzQ1szaVn+vuE9w9z93zcnNzk3k5EUlnBQVh8NyKFeF016OOCleke+QRGD1ag+jqUTKf7AKgs5kdZGY5wCBgeuIKZpY4gXt/YEmSr18APO3uXyZsa3+zMI2jmfWIalyf5PZEJFOVdz9ddBEce2wIhhdfDAEi9arGs57cvdTMRgAzgMbAX9y92MxGA0XuPh0YaWb9gVJgAzC0/PlmNg84FGhhZquB89x9RrR4EHBjpZfMBy4ws1LgC2CQe1xXWReRlHHggdCjB8yYEYJi6tQwHbnUO8uE3+C8vDwvKiqKuwwR
qW9z54apOC67DHJy4q4m7ZnZQnev8bqvNe5RiIikjN69w00alI7+iIhItRQUIiJSLQWFiIhUS0EhIiLVUlCIiEi1FBQiIlItBYWIiFRLQSEiItXKiJHZZlYCvB93HbupDfBx3EWkEH0eO9LnUUGfxY525/M40N1rnFU1I4IiE5hZUTJD6bOFPo8d6fOooM9iRw3xeajrSUREqqWgEBGRaikoUseEuAtIMfo8dqTPo4I+ix3V++ehYxQiIlIt7VGIiEi1FBQxMLOOZjbbzJaYWbGZXRy1f8PM/m5my6J/94271oZiZo3N7DUzezp6fJCZvRx9FlOiy/BmBTPbx8ymmtnb0Xfk6Cz/bhRG/5+8aWYPmdme2fL9MLO/mNk6M3szoa3K74IFd5jZcjNbbGZH1FUdCop4lAKXunsXoCdwkZl1Ba4AZrl7Z2BW9DhbXMyO11q/Cbg9+iw+Ac6Lpap4jAWed/dDge8TPpes/G6YWXtgJJDn7t8lXI55ENnz/bgP6FOpbWffhb5A5+g2HBhfV0UoKGLg7mvd/dXo/ueEH4L2wADg/mi1+4HT46mwYZlZB+BUYFL02IAfA1OjVbLps2gF9AYmA7j7VnffSJZ+NyJNgL3MrAnQDFhLlnw/3H0usKFS886+CwOABzx4CdjHzNrVRR0KipiZWSfgcOBloK27r4UQJkC2XDl+DPAboCx63BrY6O6l0ePVhCDNBt8CSoB7o664SWbWnCz9brj7B8CtwH8IAfEpsJDs/X7Azr8L7YFVCevV2eeioIiRmbUAHgNGuftncdcTBzPrB6xz94WJzVWsmi2n5zUBjgDGu/vhwP+RJd1MVYn63wcABwEHAM0JXSyVZcv3ozr19v+NgiImZtaUEBJ/c/fHo+aPyncVo3/XxVVfA+oF9Dez94CHCV0KYwi7zU2idToAa+Ipr8GtBla7+8vR46mE4MjG7wbAicC77l7i7tuAx4FjyN7vB+z8u7Aa6JiwXp19LgqKGER98JOBJe7+p4RF04FzovvnAE82dG0Nzd1/6+4d3L0T4SDlP9z9LGA2kB+tlhWfBYC7fwisMrNDoqYTgLfIwu9G5D9ATzNrFv1/U/55ZOX3I7Kz78J0YEh09lNP4NPyLqrdpQF3MTCzHwLzgDeo6Jf/HeE4xSPANwn/gxS4e+UDWRnLzI4HLnP3fmb2LcIexjeA14D/cvctcdbXUMysO+HAfg6wEjiX8EddVn43zOw64EzC2YKvAcMIfe8Z//0ws4eA4wkzxH4E/B6YRhXfhShI7yKcJbUZONfdi+qkDgWFiIhUR11PIiJSLQWFiIhUS0EhIiLVUlCIiEi1FBQiIlItBYWIiFRLQSEiItVSUIiISLX+H9OKNRaYCXCTAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Plot the score for each K to find the best number of clusters (highest score)\n",
    "plt.plot(Ks, np.array(CH_scores), 'r-')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
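    "K=10 gives the highest score (about 0.534), so it is the best clustering among the values tried, although it also took the longest to compute (68 s)."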
   ]
  },
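  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The assignment asks for CH scores, while the function above stores the silhouette score in `CH_score`. As a rough illustration of what the Calinski-Harabasz index measures, here is a minimal pure-Python sketch of the usual definition: between-cluster dispersion divided by (k - 1), over within-cluster dispersion divided by (n - k). The function name `calinski_harabasz` and the toy data are assumptions made for this example only and are not part of the original notebook.\n",
    "\n",
    "```python\n",
    "# Pure-Python Calinski-Harabasz index (illustrative sketch, no third-party packages).\n",
    "def calinski_harabasz(points, labels):\n",
    "    n = len(points)\n",
    "    dim = len(points[0])\n",
    "    # Group the points by cluster label\n",
    "    clusters = {}\n",
    "    for p, lab in zip(points, labels):\n",
    "        clusters.setdefault(lab, []).append(p)\n",
    "    k = len(clusters)\n",
    "    if k < 2 or k >= n:\n",
    "        raise ValueError('need 2 <= number of clusters < number of points')\n",
    "\n",
    "    def mean(rows):\n",
    "        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]\n",
    "\n",
    "    def sq_dist(a, b):\n",
    "        return sum((x - y) ** 2 for x, y in zip(a, b))\n",
    "\n",
    "    overall = mean(points)\n",
    "    between = 0.0  # between-cluster dispersion\n",
    "    within = 0.0   # within-cluster dispersion\n",
    "    for rows in clusters.values():\n",
    "        centre = mean(rows)\n",
    "        between += len(rows) * sq_dist(centre, overall)\n",
    "        within += sum(sq_dist(r, centre) for r in rows)\n",
    "    if within == 0.0:\n",
    "        return float('inf')  # degenerate case: every point sits on its centroid\n",
    "    return (between / (k - 1)) / (within / (n - k))\n",
    "\n",
    "# Two tight, well-separated toy clusters\n",
    "toy_points = [(0, 0), (0, 1), (10, 0), (10, 1)]\n",
    "toy_labels = [0, 0, 1, 1]\n",
    "print(calinski_harabasz(toy_points, toy_labels))  # 200.0\n",
    "```\n",
    "\n",
    "Unlike the silhouette score, this index is not bounded to [-1, 1], so its values are not comparable with the numbers printed above. In practice one would probably call the scikit-learn metric instead; the commented-out line in the function uses the older spelling `calinski_harabaz_score`, and newer scikit-learn releases spell it `calinski_harabasz_score`."
   ]
  },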
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Displaying the clustering results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYAAAAEJCAYAAACdePCvAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAADbVJREFUeJzt3X+s3XV9x/HnS6pumSaU9UIIrdaYxsmWgeymkvEPcwsWtgydIcFs2hmS7g9YMNEssH/YMMv8Y3OLmZJ0k1Ayp2FRY2OIrOkw7I+p3DpEWGU0iHBtQ6+roouJC+69P863eqH39v7kntP7fj6Sm/M9n/s553zON3CfPd/zK1WFJKmfV4x7AZKk8TAAktSUAZCkpgyAJDVlACSpKQMgSU0ZAElqygBIUlMGQJKa2jLuBZzNtm3baufOneNehiSdU44cOfLdqppaat5EB2Dnzp3MzMyMexmSdE5J8u3lzPMQkCQ1ZQAkqSkDIElNGQBJasoASFJTBkCSmjIAktSUAZCkpgyAJDVlACSpKQMgSU0ZAElqygBIUlMGQJKaMgCS1JQBkKSmDIAkNWUAJKkpAyBJTRkASWrKAEhSU0sGIMmOJA8mOZrk8SS3DuMXJDmU5MnhdOswniQfTXIsyaNJrph3XXuH+U8m2fvy3S1J0lKW8wjgBeADVfVm4Erg5iSXArcBh6tqF3B4OA9wLbBr+NkH3AWjYAB3AG8FdgN3nI6GJGnjLRmAqjpRVV8btn8IHAUuAa4HDgzTDgDvGLavB+6tkS8D5ye5GHg7cKiqTlXV94BDwJ51vTeSpGVb0XMASXYCbwG+AlxUVSdgFAngwmHaJcCz8y42O4wtNi5JGoNlByDJa4DPAO+vqh+cbeoCY3WW8Zfezr4kM0lm5ubmlrs8SdIKLSsASV7J6I//J6vqs8Pwc8OhHYbTk8P4LLBj3sW3A8fPMv4iVbW/qqaranpqamol90WStALLeRVQgE8AR6vqI/N+dRA4/UqevcDn542/d3g10JXA88MhogeAa5JsHZ78vWYYkySNwZZlzLkKeA/wjSSPDGN/CnwYuC/JTcAzwA3D7+4HrgOOAT8C3gdQVaeSfAh4eJh3Z1WdWpd7IUlasVSdcRh+YkxPT9fMzMy4lyFJ55QkR6pqeql5vhNYkpoyAJLUlAGQpKYMgCQ1ZQAkqSkDIElNGQBJasoASFJTBkCSmjIAktSUAZCkpgyAJDVlACSpKQMgSU0ZAElqygBIUlMGQJKaMgCS1JQBkKSmDIAkNWUAJKkpAyBJTRkASWrKAEhSUwZAkpoyAJLUlAGQpKYMgCQ1ZQAkqSkDIElNGQBJasoASFJTBkCSmjIAktSUAZCkpgyAJDVlACSpqSUDkOTuJCeTPDZv7M+SfCfJI8PPdfN+d3uSY0meSPL2eeN7hrFjSW5b/7siSVqJ5TwCuAfYs8D431TV5cPP/QBJLgVuBH55uMzHk5yX5DzgY8C1wKXAu4e5kqQx2bLUhKp6KMnOZV7f9cCnq+rHwLeSHAN2D787VlVPAST59DD3P1e8YknSuljLcwC3JHl0OES0dRi7BHh23pzZYWyx8TMk2ZdkJsnM3NzcGpYnSTqb1QbgLuCNwOXACeCvh/EsMLfOMn7mYNX+qpququmpqalVLk+StJQlDwEtpKqeO72d5O+BLwxnZ4Ed86ZuB44P24uNS5LGYFWPAJJcPO/sO4HTrxA6CNyY5NVJ3gDsAr4KPAzsSvKGJK9i9ETxwdUvW5K0Vks+AkjyKeBqYFuSWeAO4OoklzM6jPM08EcAVfV4kvsYPbn7AnBzVf1kuJ5bgAeA84C7q+rxdb83kqRlS9WCh+InwvT0dM3MzIx7GZJ0TklypKqml5rnO4ElqSkDIElNGQBJasoASFJTBkCSmjIAktSUAZCkpgyAJDVlACSpKQMgSU0ZAElqygBIUlMGQJKaMgCS1JQB
kKSmDIAkNWUAJKkpAyBJTRkASWrKAEhSUwZAkpoyAJLUlAGQpKYMgCQ1ZQAkqSkDIElNGQBJasoASFJTBkCSmjIAktSUAZCkpgyAJDVlACSpKQMgSU0ZAElqygBIUlNLBiDJ3UlOJnls3tgFSQ4leXI43TqMJ8lHkxxL8miSK+ZdZu8w/8kke1+euyNJWq7lPAK4B9jzkrHbgMNVtQs4PJwHuBbYNfzsA+6CUTCAO4C3AruBO05HQ5I0HksGoKoeAk69ZPh64MCwfQB4x7zxe2vky8D5SS4G3g4cqqpTVfU94BBnRkWStIFW+xzARVV1AmA4vXAYvwR4dt682WFssfEzJNmXZCbJzNzc3CqXJ0layno/CZwFxuos42cOVu2vqumqmp6amlrXxUmSfma1AXhuOLTDcHpyGJ8Fdsybtx04fpZxSdKYrDYAB4HTr+TZC3x+3vh7h1cDXQk8PxwiegC4JsnW4cnfa4YxSdKYbFlqQpJPAVcD25LMMno1z4eB+5LcBDwD3DBMvx+4DjgG/Ah4H0BVnUryIeDhYd6dVfXSJ5YlSRsoVQseip8I09PTNTMzM+5lSNI5JcmRqppeap7vBJakpgyAJDVlACSpKQMgSU0ZAElqygBIUlMGQJKaMgCS1JQBkKSmDIAkNWUAJKkpAyBJTRkASWrKAEhSUwZAkpoyAJLUlAGQpKYMgCQ1ZQAkqSkDIElNGQBJasoASFJTBkCSmjIAktSUAZCkpgyAJDVlACSpKQMgSU0ZAElqygBIUlMGQJKaMgCS1JQBkKSmDIAkNWUAJKkpAyBJTa0pAEmeTvKNJI8kmRnGLkhyKMmTw+nWYTxJPprkWJJHk1yxHndAkrQ66/EI4Deq6vKqmh7O3wYcrqpdwOHhPMC1wK7hZx9w1zrctiRplV6OQ0DXAweG7QPAO+aN31sjXwbOT3Lxy3D7kqRlWGsACviXJEeS7BvGLqqqEwDD6YXD+CXAs/MuOzuMSZLGYMsaL39VVR1PciFwKMk3zzI3C4zVGZNGIdkH8LrXvW6Ny5MkLWZNjwCq6vhwehL4HLAbeO70oZ3h9OQwfRbYMe/i24HjC1zn/qqarqrpqamptSxPknQWqw5Akl9I8trT28A1wGPAQWDvMG0v8Plh+yDw3uHVQFcCz58+VCRJ2nhrOQR0EfC5JKev55+q6otJHgbuS3IT8AxwwzD/fuA64BjwI+B9a7htSdIarToAVfUUcNkC4/8N/OYC4wXcvNrbkyStL98JLElNGQBJasoASFJTBkCSmjIAktSUAZCkpgyAJDVlACSpKQMgSU0ZAElqygBIUlMGQJKaMgCS1JQBkKSmDIAkNWUAJKkpAyBJTRkASWrKAEhSUwZAkpoyAJLUlAGQpKYMgCQ1ZQAkqSkDIElNGQBJasoASFJTBkCSmjIAktSUAZCkpgyAJDVlACSpKQMgSU0ZAElqygBIUlMGQJKa2vAAJNmT5Ikkx5LcttG3L0ka2dAAJDkP+BhwLXAp8O4kl27kGiRJIxv9CGA3cKyqnqqq/wU+DVy/wWuQJLHxAbgEeHbe+dlhTJK0wTY6AFlgrF40IdmXZCbJzNzc3AYtS5L62egAzAI75p3fDhyfP6Gq9lfVdFVNT01NbejiJKmTjQ7Aw8CuJG9I8irgRuDgBq9BkgRs2cgbq6oXktwCPACcB9xdVY9v5BokSSMbGgCAqrofuH+jb1eS9GK+E1iSmjIAktSUAZCkpgyAJDVlACSpqVTV0rPGJMkc8O01XMU24LvrtJzNxn2zMPfL4tw3i5u0ffP6qlrynbQTHYC1SjJTVdPjXsckct8szP2yOPfN4s7VfeMhIElqygBIUlObPQD7x72ACea+WZj7ZXHum8Wdk/tmUz8HIEla3GZ/BCBJWsSmDIBfPL+wJHcnOZnksXGvZdIk2ZHkwSRHkzye5NZxr2lSJPm5JF9N8vVh3/z5uNc0SZKcl+Q/knxh3GtZqU0XAL94/qzuAfaMexET
6gXgA1X1ZuBK4Gb/u/mpHwNvq6rLgMuBPUmuHPOaJsmtwNFxL2I1Nl0A8IvnF1VVDwGnxr2OSVRVJ6rqa8P2Dxn9D+33VQM18j/D2VcOPz55CCTZDvw28A/jXstqbMYA+MXzWpMkO4G3AF8Z70omx3CY4xHgJHCoqtw3I38L/Anwf+NeyGpsxgAs+cXz0mKSvAb4DPD+qvrBuNczKarqJ1V1OaPv8d6d5FfGvaZxS/I7wMmqOjLutazWZgzAkl88Ly0kySsZ/fH/ZFV9dtzrmURV9X3gS/hcEsBVwO8meZrRoea3JfnH8S5pZTZjAPziea1YkgCfAI5W1UfGvZ5JkmQqyfnD9s8DvwV8c7yrGr+qur2qtlfVTkZ/Z/61qv5gzMtakU0XgKp6ATj9xfNHgfv84vmRJJ8C/h14U5LZJDeNe00T5CrgPYz+FffI8HPduBc1IS4GHkzyKKN/YB2qqnPuJY86k+8ElqSmNt0jAEnS8hgASWrKAEhSUwZAkpoyAJI0IVbygY1JXp/kcJJHk3xp+FiKFTEAkjQ57mH5b7L7K+DeqvpV4E7gL1d6YwZAkibEQh/YmOSNSb6Y5EiSf0vyS8OvLgUOD9sPsooPvTQAkjTZ9gN/XFW/BnwQ+Pgw/nXgXcP2O4HXJvnFlVzxlnVboiRpXQ0fTvjrwD+PPq0EgFcPpx8E/i7JHwIPAd9h9L0Wy2YAJGlyvQL4/vBJrC9SVceB34OfhuJdVfX8Sq9ckjSBho8k/1aSG2D0oYVJLhu2tyU5/Tf8duDulV6/AZCkCbHIBzb+PnBTkq8Dj/OzJ3uvBp5I8l/ARcBfrPj2/DA4SerJRwCS1JQBkKSmDIAkNWUAJKkpAyBJTRkASWrKAEhSUwZAkpr6fzJENiHUpsXrAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
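    "# Plot the clustering result, one colour per cluster\n",
    "mark = ['#1f77b4','#2ca02c','#d62728','#000000','#17becf','#8c564b','#bcbd22','#ff7f0e','#524C90','#845868']\n",
    "n_clusters = 10\n",
    "mb_kmeans = MiniBatchKMeans(n_clusters = n_clusters)  # build the clusterer\n",
    "mb_kmeans.fit(X_train)\n",
    "\n",
    "y_train_pred = mb_kmeans.labels_  # cluster label of each sample\n",
    "cents = mb_kmeans.cluster_centers_  # cluster centroids\n",
    "\n",
    "X_train = np.array(X_train)  # convert to a NumPy array so that positional indexing such as X_train[:, 0] works\n",
    "# plt.plot with only a colour string as its format draws no markers, so isolated points do not show up;\n",
    "# scatter the first two columns (event_id and c_1) instead, coloured by cluster label.\n",
    "colors = [mark[i] for i in y_train_pred]\n",
    "plt.scatter(X_train[:, 0], X_train[:, 1], c=colors, s=5)\n",
    "plt.show()"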
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
